129 research outputs found

    A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis

    Background. The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability. Results. V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours, when using a desktop. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate, and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the Phase II individuals of HapMap. Conclusions. V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.

    Mouse obesity network reconstruction with a variational Bayes algorithm to employ aggressive false positive control

    Background. We propose a novel variational Bayes network reconstruction algorithm to extract the most relevant disease factors from high-throughput genomic data-sets. Our algorithm is the only scalable method for regularized network recovery that employs Bayesian model averaging and that can internally estimate an appropriate level of sparsity to ensure few false positives enter the model without the need for cross-validation or a model selection criterion. We use our algorithm to characterize the effect of genetic markers and liver gene expression traits on mouse obesity related phenotypes, including weight, cholesterol, glucose, and free fatty acid levels, in an experiment previously used for discovery and validation of network connections: an F2 intercross between the C57BL/6J and C3H/HeJ mouse strains, where apolipoprotein E is null on the background. Results. We identified eleven genes, Gch1, Zfp69, Dlgap1, Gna14, Yy1, Gabarapl1, Folr2, Fdft1, Cnr2, Slc24a3, and Ccl19, and a quantitative trait locus directly connected to weight, glucose, cholesterol, or free fatty acid levels in our network. None of these genes were identified by other network analyses of this mouse intercross data-set, but all have been previously associated with obesity or related pathologies in independent studies. In addition, through both simulations and data analysis we demonstrate that our algorithm achieves superior performance in terms of power and type I error control than other network recovery algorithms that use the lasso and have bounds on type I error control. Conclusions. Our final network contains 118 previously associated and novel genes affecting weight, cholesterol, glucose, and free fatty acid levels that are excellent obesity risk candidates.

    Transcriptomic stratification of late-onset Alzheimer's cases reveals novel genetic modifiers of disease pathology.

    Late-Onset Alzheimer's disease (LOAD) is a common, complex genetic disorder well-known for its heterogeneous pathology. The genetic heterogeneity underlying common, complex diseases poses a major challenge for targeted therapies and the identification of novel disease-associated variants. Case-control approaches are often limited to examining a specific outcome in a group of heterogenous patients with different clinical characteristics. Here, we developed a novel approach to define relevant transcriptomic endophenotypes and stratify decedents based on molecular profiles in three independent human LOAD cohorts. By integrating post-mortem brain gene co-expression data from 2114 human samples with LOAD, we developed a novel quantitative, composite phenotype that can better account for the heterogeneity in genetic architecture underlying the disease. We used iterative weighted gene co-expression network analysis (WGCNA) to reduce data dimensionality and to isolate gene sets that are highly co-expressed within disease subtypes and represent specific molecular pathways. We then performed single variant association testing using whole genome-sequencing data for the novel composite phenotype in order to identify genetic loci that contribute to disease heterogeneity. Distinct LOAD subtypes were identified for all three study cohorts (two in ROSMAP, three in Mayo Clinic, and two in Mount Sinai Brain Bank). Single variant association analysis identified a genome-wide significant variant in TMEM106B (p-value < 5×10⁻⁸, rs1990620G) in the ROSMAP cohort that confers protection from the inflammatory LOAD subtype. Taken together, our novel approach can be used to stratify LOAD into distinct molecular subtypes based on affected disease pathways.

    Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation.

    MOTIVATION: Late onset Alzheimer's disease is currently a disease with no known effective treatment options. To better understand disease, new multi-omic data-sets have recently been generated with the goal of identifying molecular causes of disease. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data driven approach to integrate multiple data types and analytic outcomes to aggregate evidences to support the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework to learn the key characteristics of a few known driver genes from multiple feature sets and identifying other potential driver genes which have similar feature representations, and (ii) A flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types. RESULTS: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets by significantly outperforming the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's. We show that our ranked genes show a significant enrichment for single nucleotide polymorphisms associated with Alzheimer's and are enriched in pathways that have been previously associated with the disease. AVAILABILITY AND IMPLEMENTATION: Source code and link to all feature sets is available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking

    Broadly Sampled Multigene Trees of Eukaryotes

    Background. Our understanding of the eukaryotic tree of life and the tremendous diversity of microbial eukaryotes is in flux as additional genes and diverse taxa are sampled for molecular analyses. Despite instability in many analyses, there is an increasing trend to classify eukaryotic diversity into six major supergroups: the 'Amoebozoa', 'Chromalveolata', 'Excavata', 'Opisthokonta', 'Plantae', and 'Rhizaria'. Previous molecular analyses have often suffered from either a broad taxon sampling using only single-gene data or have used multigene data with a limited sample of taxa. This study has two major aims: (1) to place taxa represented by 72 sequences, 61 of which have not been characterized previously, onto a well-sampled multigene genealogy, and (2) to evaluate the support for the six putative supergroups using two taxon-rich data sets and a variety of phylogenetic approaches. Results. The inferred trees reveal strong support for many clades that also have defining ultrastructural or molecular characters. In contrast, we find limited to no support for most of the putative supergroups as only the 'Opisthokonta' receive strong support in our analyses. The supergroup 'Amoebozoa' has only moderate support, whereas the 'Chromalveolata', 'Excavata', 'Plantae', and 'Rhizaria' receive very limited or no support. Conclusion. Our analytical approach substantiates the power of increased taxon sampling in placing diverse eukaryotic lineages within well-supported clades. At the same time, this study indicates that the six supergroup hypothesis of higher-level eukaryotic classification is likely premature. The use of a taxon-rich data set with 105 lineages, which still includes only a small fraction of the diversity of microbial eukaryotes, fails to resolve deeper phylogenetic relationships and reveals no support for four of the six proposed supergroups. Our analyses provide a point of departure for future taxon- and gene-rich analyses of the eukaryotic tree of life, which will be critical for resolving their phylogenetic interrelationships.

    Molecular estimation of neurodegeneration pseudotime in older brains.

    The temporal molecular changes that lead to disease onset and progression in Alzheimer's disease (AD) are still unknown. Here we develop a temporal model for these unobserved molecular changes with a manifold learning method applied to RNA-Seq data collected from human postmortem brain samples collected within the ROS/MAP and Mayo Clinic RNA-Seq studies. We define an ordering across samples based on their similarity in gene expression and use this ordering to estimate the molecular disease stage, or disease pseudotime, for each sample. Disease pseudotime is strongly concordant with the burden of tau (Braak score, P = 1.0 × 10⁻⁵), Aβ (CERAD score, P = 1.8 × 10⁻⁵), and cognitive diagnosis (P = 3.5 × 10⁻⁷) of late-onset (LO) AD. Early stage disease pseudotime samples are enriched for controls and show changes in basic cellular functions. Late stage disease pseudotime samples are enriched for late stage AD cases and show changes in neuroinflammation and amyloid pathologic processes. We also identify a set of late stage pseudotime samples that are controls and show changes in genes enriched for protein trafficking, splicing, regulation of apoptosis, and prevention of amyloid cleavage pathways. In summary, we present a method for ordering patients along a trajectory of LOAD disease progression from brain transcriptomic data.

    Transfer learning-trained convolutional neural networks identify novel MRI biomarkers of Alzheimer's disease progression.

    Introduction: Genome-wide association studies (GWAS) for late onset Alzheimer's disease (AD) may miss genetic variants relevant for delineating disease stages when using clinically defined case/control as a phenotype due to its loose definition and heterogeneity. Methods: We use a transfer learning technique to train three-dimensional convolutional neural network (CNN) models based on structural magnetic resonance imaging (MRI) from the screening stage in the Alzheimer's Disease Neuroimaging Initiative consortium to derive image features that reflect AD progression. Results: CNN-derived image phenotypes are significantly associated with fasting metabolites related to early lipid metabolic changes as well as insulin resistance and with genetic variants mapped to candidate genes enriched for amyloid beta degradation, tau phosphorylation, calcium ion binding-dependent synaptic loss, Discussion: This is the first attempt to show that non-invasive MRI biomarkers are linked to AD progression characteristics, reinforcing their use in early AD diagnosis and monitoring.

    Large-scale proteomic analysis of human brain identifies proteins associated with cognitive trajectory in advanced age

    In advanced age, some individuals maintain a stable cognitive trajectory while others experience a rapid decline. Such variation in cognitive trajectory is only partially explained by traditional neurodegenerative pathologies. Hence, to identify new processes underlying variation in cognitive trajectory, we perform an unbiased proteome-wide association study of cognitive trajectory in a discovery (n = 104) and replication cohort (n = 39) of initially cognitively unimpaired, longitudinally assessed older-adult brain donors. We find 579 proteins associated with cognitive trajectory after meta-analysis. Notably, we present evidence for increased neuronal mitochondrial activities in cognitive stability regardless of the burden of traditional neuropathologies. Furthermore, we provide additional evidence for increased synaptic abundance and decreased inflammation and apoptosis in cognitive stability. Importantly, we nominate proteins associated with cognitive trajectory, particularly the 38 proteins that act independently of neuropathologies and are also hub proteins of protein co-expression networks, as promising targets for future mechanistic studies of cognitive trajectory. Funding: Accelerating Medicine Partnership for AD [U01AG046161, U01 AG061357]; Emory Alzheimer's Disease Research Center [P50 AG025688]; NINDS Emory Neuroscience Core [P30 NS055077]; intramural program of the National Institute on Aging (NIA); Alzheimer's Association; Alzheimer's Research UK; Michael J. Fox Foundation for Parkinson's Research; Weston Brain Institute Biomarkers Across Neurodegenerative Diseases Grant [11060]; National Institute of Neurological Disorders and Stroke [U24 NS072026]; National Institute on Aging [P30 AG19610]; Arizona Department of Health Services [211002]; Arizona Biomedical Research Commission [4001, 0011, 05-901, 1001]; [R01 AG056533]; [R01 AG053960]; [U01 MH115484]; [I01 BX003853]; [IK2 BX001820]; [R01 AG061800]; [R01 AG057911]. Open access journal.

    A novel systems biology approach to evaluate mouse models of late-onset Alzheimer's disease.

    BACKGROUND: Late-onset Alzheimer's disease (LOAD) is the most common form of dementia worldwide. To date, animal models of Alzheimer's have focused on rare familial mutations, due to a lack of frank neuropathology from models based on common disease genes. Recent multi-cohort studies of postmortem human brain transcriptomes have identified a set of 30 gene co-expression modules associated with LOAD, providing a molecular catalog of relevant endophenotypes. RESULTS: This resource enables precise gene-based alignment between new animal models and human molecular signatures of disease. Here, we describe a new resource to efficiently screen mouse models for LOAD relevance. A new NanoString nCounter® Mouse AD panel was designed to correlate key human disease processes and pathways with mRNA from mouse brains. Analysis of the 5xFAD mouse, a widely used amyloid pathology model, and three mouse models based on LOAD genetics carrying APOE4 and TREM2*R47H alleles demonstrated overlaps with distinct human AD modules that, in turn, were functionally enriched in key disease-associated pathways. Comprehensive comparison with full transcriptome data from same-sample RNA-Seq showed strong correlation between gene expression changes independent of experimental platform. CONCLUSIONS: Taken together, we show that the nCounter Mouse AD panel offers a rapid, cost-effective and highly reproducible approach to assess disease relevance of potential LOAD mouse models.